Text File | 1989-12-29 | 30.7 KB | 1,076 lines |
- Newsgroups: comp.sources.misc
- organization: gisle@ifi.uio.no
- subject: v09i093: PEP filter program [ part 2 of 5 ]
- from: gisle@ifi.uio.no (Gisle Hannemyr)
- Sender: allbery@uunet.UU.NET (Brandon S. Allbery - comp.sources.misc)
-
- Posting-number: Volume 9, Issue 93
- Submitted-by: gisle@ifi.uio.no (Gisle Hannemyr)
- Archive-name: pep/part02
-
- # This is a shell archive [ part 2 of 5 ]
- # Remove everything above and including the cut line.
- # Then run the rest of the file through /bin/sh (not csh).
- #--cut here-----cut here-----cut here-----cut here-----cut here-----cut here--#
- #!/bin/sh
- # shar: Shell Archiver
- # Execute the following text with /bin/sh to create the file(s):
- # Doc/pep.1l
- # This archive created: Fri Dec 29 14:42:42 1989
- # Wrapped by: Gisle Hannemyr (gisle@ifi.uio.no)
- echo shar: extracting pep.1l
- sed 's/^XX//' << \SHAR_EOF > pep.1l
- XX.\" @(#)pep.1l 2.0 89/12/10 [gh]
- XX.\" Usage:
- XX.\" nroff -man pep.1l
- XX.TH PEP 1L "28 December 1989" "Version 2.1"
- XX.SH NAME
- XXpep \- a file detergent
- XX.SH SYNOPSIS
- XX.B pep
- XX[
- XX.B \-a
- XX]
- XX[
- XX.B \-b
- XX]
- XX[
- XX.B \-c
- XX[
- XX.I size
- XX]]
- XX[
- XX.B \-d + | \-
- XX]
- XX.if n .ti +5
- XX[
- XX.B "\-e [ 0 | 1 | 2"
- XX]]
- XX[
- XX.B \-g
- XX.I file
- XX]
- XX[
- XX.B \-h
- XX]
- XX[
- XX.B \-i + | \-
- XX]
- XX.if n .ti +5
- XX[
- XX.B \-k + | \-
- XX]
- XX[
- XX.B \-m + | \-
- XX]
- XX[
- XX.B \-o
- XX[
- XX.B b
- XX]]
- XX[
- XX.B \-p
- XX]
- XX.if n .ti +5
- XX[
- XX.B \-s
- XX[
- XX.I size
- XX]]
- XX[
- XX.B \-t
- XX[
- XX.I size
- XX]]
- XX[
- XX.B \-u
- XX.I terminator
- XX]
- XX[
- XX.B \-v
- XX]
- XX.if n .ti +5
- XX[
- XX.B \-w + | \-
- XX]
- XX[
- XX.B \-x
- XX]
- XX[
- XX.B \-z
- XX]
- XX[
- XX.I filename
- XX.B .\|.\|.
- XX]
- XX.SH DESCRIPTION
- XX.LP
- XX.B Pep
- XXis a filter program to "clean" files. It is named after a
- XXpopular Norwegian detergent.
- XX.PP
- XX.B Pep
- XXmay be used to remove control characters, strip parity bits,
- XXinterpret ANSI escape sequences, compress tabulation,
- XXextract strings and convert character sets. Nine out of ten hackers
- XXprefer "pep" to soap (which may very well explain why some of
- XXthem smell the way they do).
- XX.PP
- XX.B Pep
- XXis a filter. Its default operation is to read from standard input
- XX(the keyboard) and write on standard output (the terminal).
- XX.PP
- XXYou may also specify the name of one or more files as the last
- XXargument on the command line. Most versions of
- XX.B pep
- XX(not the version compiled for the DEC VMS operating system)
- XXallow ambiguous filename arguments, where a single
- XX.I filename
- XXargument may specify several files.
- XX.PP
- XXYou may instruct
- XX.B pep
- XXto write the result back onto the original input file with the
- XX.B \-o
- XXoption. If you use this option, the original file will be lost.
- XXIf you want to keep the original file (something that usually will
- XXbe the case when you do things like extracting strings from an
- XXexecutable file), you should make a copy of the file before applying
- XX.B
- XXpep,
- XXand filter the copy rather than the original.
- XXSome of the functions in
- XX.B
- XXpep
- XX(in particular those selected with the
- XX.B \-b
- XXand
- XX.B \-s
- XXoptions) may remove a lot of material from files, and it may be unfortunate if
- XXthis happens to the wrong file. It is probably a good idea to always use
- XX.B pep
- XXon copies until you have some experience with the various
- XX.BR pep \-options.
- XXYou may also use the
- XX.B b
- XXargument on the
- XX.B \-o
- XXoption to save the original in a .BAK-file.
- XX.PP
- XXTo get a brief summary of the command line syntax and all the options,
- XXyou need to specify the
- XX.B \-h
- XXoption. Just type the command:
- XX.sp 0.5
- XX.RS
- XX.B pep \-h
- XX.RE
- XX.PP
- XXfollowed by the RETURN key. Note that just
- XX.B pep
- XXwill not give you this summary. The command:
- XX.sp 0.5
- XX.RS
- XX.B pep
- XX.RE
- XX.PP
- XXwill start
- XX.B pep
- XXas a filter, and it will just echo back whatever you type, until you
- XXtype the end of file character (usually CTRL-D or CTRL-Z).
- XX.PP
- XXWhen
- XX.B pep
- XXis running as a filter, it is reading from the standard input and
- XXwriting to the standard output. In this state,
- XX.B pep
- XXwill be very much less verbose than it usually is. It will still
- XXprint error messages, but very little else. Note that while:
- XX.sp 0.5
- XX.RS
- XX.nf
- XX.B pep < foobar.in > foobar.out
- XX.B pep \-ob foobar.txt
- XX.fi
- XX.RE
- XX.PP
- XXwill do more or less the same job, the first will do it quietly,
- XXin the tradition of Unix filters; the latter will print the
- XXcopyright notice, a detailed list of the things it will do,
- XXand finally a list and line count
- XXof all the files it processes as it plods along.
- XX.PP
- XX.B Pep
- XXwill remove some "noise" from files, even if no options are specified.
- XXThe following is the default behavior:
- XX.RS
- XX.TP 3
- XX\(bu
- XXremove trailing spaces;
- XX.TP 3
- XX\(bu
- XXterminate each line with the canonical line terminator (usually LF, CR or both);
- XX.TP 3
- XX\(bu
- XXremove underlining intended for backspacing printers;
- XX.TP 3
- XX\(bu
- XXremove control characters (character codes < 32) except canonical line
- XXterminator, FF and TAB;
- XX.TP 3
- XX\(bu
- XXbreak the line before the FF if a line contains an FF anywhere except in the
- XXfirst column.
- XX.RE
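The default behaviour listed above can be approximated in a few lines. The following Python sketch is an illustration only, not pep's own code. The function name clean, the use of LF as the terminator, and the assumption that underlining appears as an underscore paired with a backspace are all hypothetical choices made for this example.

```python
import re

FF = "\f"

def clean(text, terminator="\n"):
    """Approximate the default clean-up on a string of 8-bit characters."""
    lines = re.split(r"\r\n|\r|\n", text)
    if lines and lines[-1] == "":
        lines.pop()  # a final terminator does not start a new line
    out = []
    for line in lines:
        # Assumption: underlining is "_" + BS or BS + "_" next to a character.
        line = re.sub(r"_\x08|\x08_", "", line)
        # Drop control characters (codes < 32) except TAB and FF.
        line = "".join(c for c in line if ord(c) >= 32 or c in "\t\f")
        # Break the line before any FF that is not in the first column.
        start = 0
        for i, ch in enumerate(line):
            if ch == FF and i > 0:
                out.append(line[start:i].rstrip(" "))
                start = i
        out.append(line[start:].rstrip(" "))  # remove trailing spaces
    return "".join(piece + terminator for piece in out)
```

Calling clean on a text with mixed CR/LF and LF terminators returns the same text with a single, uniform terminator on every line.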
- XX.PP
- XXIf you want to check what
- XX.B pep
- XXactually intends to do to your file before it does it, you may make it
- XXpause with the
- XX.B \-p
- XXoption. For example:
- XX.sp 0.5
- XX.RS
- XX.B pep \-p foobar.txt
- XX.RE
- XX.PP
- XXwill make
- XX.B pep
- XXstop after displaying a list of the conversions it will apply to the
- XXfile. The user is prompted and may choose to proceed
- XX(hitting the RETURN key), or abort
- XXthe program without doing anything (hitting CTRL-C).
- XX.PP
- XXThe user may want other conversions than the default action described
- XXabove. A number of conversion functions may be selected by specifying one or
- XXmore options on the command line.
- XX.PP
- XXSome of the options require an additional argument switch, and must be
- XXfollowed by a "+" or a "\-", other options
- XXrequire a number or a filename argument.
- XXMost of the options may be combined with other options, but a few are
- XXmutually exclusive. If the user specifies invalid options or option
- XXarguments, then
- XX.B pep
- XXwill abort with an error message and return an error exit code on
- XXoperating systems that support exit codes.
- XX.SH OPTIONS
- XX.TP
- XX.B \-a
- XXWrite out information about
- XX.B
- XXpep.
- XX.TP
- XX.B \-b
- XXRemove all characters not in the original 7-bit character set (ISO 646).
- XXI.e. remove the characters which are encoded from 128 to 255.
- XX(If this option is combined with the
- XX.B \-x
- XXoption, it will print the codes for these characters in hexadecimal
- XXinstead of removing them.)
- XXThe
- XX.B \-b
- XXoption is powerful, and may remove a lot of bytes if you use it
- XXon the wrong file. Only use it if you know exactly how the eighth bit is
- XXused in the file you intend to filter. Also note that the options
- XX.B i, d, k, g, m, w
- XXor
- XX.B z
- XXin most cases are better suited to
- XXprocess files where the eighth bit is set.
- XX.TP
- XX\fB\-c \fR[ \fIsize \fR]
- XXCompress space into tabulation. I.e. insert TAB characters when
- XXreplacing a run of two or more SPACE characters would produce a
- XXsmaller output file.
- XXThis function is the opposite of the function invoked with the
- XX.B \-t
- XXoption.
- XX.IP
- XXThe default tabulation size is 8,
- XXbut you may specify any other tabulation with the optional numeric
- XXargument.
- XX.TP
- XX.B \-d + | \-
- XXConvert to or from the ISO 8859/1 8 bit character set and the Norwegian
- XXversion of the ISO 646 7 bit character set. If the argument is "+",
- XXthe file is converted
- XX.I to
- XXISO 8859/1. If the argument is "\-",
- XXthe file is converted
- XX.I from
- XXISO 8859/1. The ISO 8859/1 character set is also
- XXknown as the "DEC Multinational Character Set".
- XX.TP
- XX\fB\-e \fR[ \fB0 | 1 | 2 \fR]
- XXInterpret ANSI screen control sequences (also known as ANSI ESCAPE
- XXsequences). This function makes
- XX.B pep
- XXemulate cursor positioning and other functions on an ANSI-terminal.
- XX.IP
- XX.B Pep
- XXwill complain about "strange" (i.e. implementation dependent) use of
- XXANSI escape sequences.
- XX.IP
- XX.B Pep
- XXwill normally save a screen image on the output file when one of
- XXtwo events occur: 1) When the screen is full and scrolls up;
- XXor 2) just before a screen image is erased with the "erase screen"
- XXANSI screen control sequence. In some cases important fields
- XXon the screen will be overwritten or erased. There
- XXis no good solution to this
- XXproblem, but
- XX.B pep
- XXprovides the user with some opportunity to guard against overwriting
- XXand erasure. This is done by specifying an additional numeric argument
- XXto the
- XX.B \-e
- XXoption. This number indicates the level of protection
- XXand is interpreted as follows:
- XX.sp 0.5
- XX.RS
- XX.RS
- XX.TP 3
- XX0:
- XXno protection \(em fields may be erased and overwritten
- XX(this is the default);
- XX.TP
- XX1:
- XXsequences that erase fields are ignored;
- XX.TP
- XX2:
- XXsequences that erase or overwrite fields are ignored.
- XX.RE
- XX.RE
- XX.TP
- XX\fB\-g \fIfile \fR
- XXRead the conversion table from a file. The name of the file must be
- XXappended as the argument to this option.
- XX.IP
- XXThe file itself is a standard ASCII text file where each line should
- XXcontain two decimal numbers. The first number is the character code
- XXto convert
- XX.I from,
- XXand the second number is the character code to convert
- XX.I to.
- XXA "#" character and all the following characters up to a NEWLINE are
- XXconsidered a comment, and are ignored. Comments are, however, echoed
- XXon the screen along with the other comments
- XX.B pep
- XXmakes, unless the comment line starts with a "##".
- XX.IP
- XXBelow is an example of how such a conversion file may look:
- XX.sp 0.5
- XX.PP
- XX.ft B
- XX.nf
- XX.RS
- XX.RS
- XX# Convert from Macintosh to IBM-PC
- XX##This line is not echoed on the screen.
- XX# MAC IBM
- XX 174 146
- XX 175 157
- XX 129 143
- XX 190 145
- XX 191 155
- XX 140 134
- XX# EOF
- XX.RE
- XX.RE
- XX.fi
- XX.ft R
- XX.TP
- XX.B \-h
- XXWrite a brief summary of
- XX.B pep
- XXoptions, and exit.
- XX.TP
- XX.B \-i + | \-
- XXConvert to or from the IBM 8 bit character set (Code Page 850 Multilingual)
- XXand the Norwegian
- XXversion of the ISO 646 7 bit character set. If the argument is "+",
- XXthe file is converted
- XX.I to
- XXCP 850. If the argument is "\-",
- XXthe file is converted
- XX.I from
- XXCP 850. The CP 850 character set (or a subset of it)
- XXis what is used in the IBM PC, AT, and PS/2 series of
- XXcomputers and their clones. Note that some machines with
- XXAmerican PROMs have yen and cent characters in
- XXthe positions rightfully belonging to upper and lower case
- XXversions of the Norwegian character
- XXwritten as an "o" with a slash across it (often referred to as
- XX.IR oslash ).
- XX.TP
- XX.B \-k + | \-
- XXConvert to or from an 8 bit character set and the
- XXISO 646 7 bit character set. This is a modified version
- XXof the
- XX.B \-i
- XXfunction, hacked to preserve both the
- XX.I backslash
- XXcharacter and the upper case
- XX.I oslash
- XXcharacter as required by, among others, the "KnowledgeMan" package. These
- XXcharacters share the same code (92 decimal) in 7 bit ISO 646,
- XXbut use different codes (92 is backslash, 157 is oslash) in
- XX8 bit CP 850. To get around this, two backslashes in ISO 646
- XXwill be converted to the upper case oslash character in CP 850, while
- XXa single backslash will be preserved \(em and vice versa.
- XX.IP
- XXIf this option is combined with the
- XX.B \-d
- XXor
- XX.B \-m
- XXoption, the DEC/ISO or the Macintosh character set is used as the base
- XXinstead of CP 850.
- XX.TP
- XX.B \-m + | \-
- XXConvert to or from the Apple Macintosh 8 bit character set and the Norwegian
- XXversion of the ISO 646 7 bit character set. If the argument is "+",
- XXthe file is converted
- XX.I to
- XXthe Macintosh character set; if the argument is "\-",
- XXthe file is converted
- XX.I from
- XXthe Macintosh character set.
- XXSee description of
- XX.B \-v
- XXoption below and
- XXnote in "bugs" section below about treatment of "end-of-line" and
- XX"end-of-paragraph".
- XX.TP
- XX\fB\-o \fR[ \fBb \fR]
- XX.B Pep
- XXwill usually write the result of conversions on the standard output
- XX.I (stdout).
- XXThis option instead instructs
- XX.B pep
- XXto replace each named input file with a file containing the result
- XXof filtering the file through
- XX.B pep.
- XXIf the option is augmented with the argument
- XX.B b
- XX(i.e.
- XX.BR \-ob ),
- XXthen
- XX.B pep
- XXwill create a backup copy of the original input file on a file
- XXwith extension .BAK. If you just specify
- XX.B \-o
- XXthe original file is deleted.
- XX.IP
- XXThe VMS version of
- XX.B pep
- XXwill always run as if this option was specified. This is because
- XXVMS does not support useful redirection or pipes. Therefore, it is never
- XXnecessary to specify the
- XX.B \-o
- XXoption under VMS, but users should still specify
- XX.B \-ob
- XXif they want a backup copy of the original input file.
- XX.TP
- XX.B \-p
- XXWrite out a brief description of the conversion functions that
- XXwill be activated by the current
- XXset of options, and pause. The user may review the list of
- XXconversion functions and abort (by hitting CTRL-C) if they do not have
- XXthe intended effect.
- XX.TP
- XX\fB\-s \fR[ \fIsize \fR]
- XXFind strings in extremely "noisy" files.
- XX.IP
- XX.BR Pep 's
- XXconcept of a string is that it is a sequence of "printable" characters
- XXof a certain length. The default minimum length of this sequence is
- XX4, but this may be changed by the user by supplying an optional
- XXnumeric argument that becomes the minimum length of the sequence.
- XX.IP
- XXThe default definition of a "printable" character is a symbol with
- XXencoding above 31 decimal (i.e. 32 to 255) plus certain
- XXcommon control characters (TAB, CR and LF). This definition
- XXis almost always too liberal, and will include a lot of "noise" in
- XXthe output. One or more of the options
- XX.B \-b, \-d, \-i, \-m
- XXor
- XX.B \-z
- XXshould be specified in addition to
- XX.B \-s
- XXin order to narrow the definition and the search space.
- XXIn my experience, the
- XX.B \-b
- XXoption is a particularly
- XXuseful additional filter when searching for strings.
- XX.TP
- XX\fB\-t \fR[ \fIsize \fR]
- XXExpand tabulation, replacing the TAB character with a suitable number
- XXof spaces. The default tabulation size is 8, but the optional
- XXnumeric argument
- XX.I size
- XXmay be used to set tabulation to any desired size.
- XX.TP
- XX\fB\-u r | n | s | - | # | \fInumber \fR
- XX.BR Pep 's
- XXdefault behaviour is to terminate lines with whatever is the
- XXcanonical line terminator (the standard way to terminate
- XXa text line) on the assumed target system for the output file.
- XX(This means CR/LF on a microcomputer system, LF on a UNIX system,
- XXand CR if the target is a Macintosh). The assumed target system
- XXis usually the system
- XX.B pep
- XXis running on, unless you request folding to the character set
- XXof another computer system. Then, that computer system becomes
- XXthe assumed target.
- XX.IP
- XXThe
- XX.B \-u
- XXoption allows you to override this assumption.
- XXYou do this by specifying explicitly (in decimal) the numeric ASCII
- XXvalue of the end of line character you want in your output file.
- XXFor example, to make sure
- XXlines are terminated by LF (the standard for UNIX text files),
- XXyou may use
- XX.BR \-u10 ,
- XXbecause 10 is the ASCII value of the newline (LF) control character.
- XXInstead of a numeric argument, you may specify
- XX.BR r ,
- XXfor carriage return (CR),
- XX.BR n ,
- XXfor newline (LF),
- XX.BR s ,
- XXfor record separator (RS), the symbol
- XX.BR - ,
- XXfor no line terminator, or the symbol
- XX.B #
- XXto get carriage return followed by a newline (CR/LF).
- XX.TP
- XX.B \-v
- XXNormally,
- XX.B pep
- XXwill terminate each line with the canonical line terminator.
- XXSome typesetting programs and word processors, however, require
- XXthat no hard line terminator is present within a paragraph, and
- XXthat only paragraphs are hard terminated. If you want to
- XXimport a file to such a typesetting program or word processor,
- XXyou may instruct
- XX.B pep
- XXto terminate paragraphs
- XX.I only
- XXwith this option.
- XX.IP
- XXSee note in "bugs" section below about treatment of "end-of-line" and
- XX"end-of-paragraph".
- XX.TP
- XX.B \-w + | \-
- XXThis slightly obsolete option converts files to and from the
- XXWordStar version 3.2 "document" mode. If the argument is "+",
- XXthe file is converted
- XX.I to
- XXWordStar document mode; if the argument is "\-",
- XXthe file is converted
- XX.I from
- XXWordStar document mode into plain ASCII text.
- XX.TP
- XX.B \-x
- XXExpand unprintable characters. This option
- XXwill make
- XX.B pep
- XXexpand the characters it would otherwise remove from the file by
- XXprinting the character encoding of these characters in
- XXhexadecimal between angle brackets.
- XX.TP
- XX.B \-z
- XXZero the eighth bit (a.k.a. the parity bit) on all characters in the file.
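The effect of the -z and -x options described above can be illustrated with a short Python sketch. This is not pep's own code. The function names are hypothetical, and the exact output layout (two upper-case hexadecimal digits between angle brackets) is an assumption, since the text only says that the code is printed in hexadecimal between angle brackets.

```python
def zero_eighth_bit(data: bytes) -> bytes:
    """Clear the top (parity) bit of every byte, as -z is described."""
    return bytes(b & 0x7F for b in data)

def expand_unprintable(data: bytes) -> str:
    """Show control characters as <hex> instead of dropping them (cf. -x)."""
    keep = {0x09, 0x0A, 0x0C, 0x0D}  # TAB, LF, FF, CR are left alone
    parts = []
    for b in data:
        if b >= 32 or b in keep:
            parts.append(chr(b))
        else:
            parts.append("<%02X>" % b)  # assumed layout of the hex code
    return "".join(parts)
```

For instance, an ESC character (code 27) that would normally be removed shows up as the text <1B> in the output.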
- XX.SH ENVIRONMENT
- XX.PP
- XX.B Pep
- XXknows a single environment variable:
- XX.BR PEP ,
- XXwhich may be
- XXused to indicate the lookup path for files with conversion
- XXtables. Below are some examples of how to set this in some
- XXoperating systems:
- XX.sp 0.5
- XX.RS
- XX.nf
- XX\fBset PEP=c:\eusr\elib \fR(MS-DOS)
- XX\fBsetenv PEP /usr/local/lib \fR(UNIX)
- XX\fBdefine PEP "DISK_USR:<LOCAL.LIB>" \fR(VMS)
- XX.fi
- XX.RE
- XX.PP
- XXThe command to set this environment variable should usually be
- XXpart of the command file that is read during login (this may
- XXbe named
- XX.B "AUTOEXEC.BAT, LOGIN.COM, .profile"
- XXor
- XX.B .login
- XXdepending upon your choice of operating system). Please note
- XXthat environment variables do not exist under CP/M.
- XX.SH EXAMPLES
- XXSome of the examples below use i/o redirection and pipes,
- XXas indicated with the symbols ">" and "<" (redirection)
- XXand "|" (pipe symbol). These examples
- XXonly apply to operating systems that support
- XXredirection and pipes.
- XX.PP
- XX.TP 3
- XX.B pep \-h
- XXPrint a quick summary of all available options, and exit.
- XX.TP
- XX.B "pep"
- XXRead input from standard input (the keyboard), and write
- XXthe result on standard output (the screen) until the user
- XXtypes the end of file character (usually CTRL-D (UNIX) or
- XXCTRL-Z (MS-DOS)). This is of limited practical use by
- XXitself; usually this command is inserted into the middle of a
- XXcommand where the standard input and standard output are pipes.
- XX.TP
- XX.B "pep < foo.bar"
- XXDisplay a slightly cleaned-up version of the file
- XX.I foo.bar
- XXon the screen.
- XX.TP
- XX.B "pep < foo.bar > foo.txt"
- XXRead the file
- XX.I foo.bar,
- XXclean it, and write the result on the file
- XX.I foo.txt.
- XX.TP
- XX.B "pep foo.bar > foo.txt"
- XXRead the file
- XX.I foo.bar,
- XXclean it, and write the result on the file
- XX.I foo.txt.
- XX.TP
- XX.B "pep foo1.bar foo2.bar > foo.txt"
- XXRead the files
- XX.I "foo1.bar"
- XXand
- XX.I foo2.bar,
- XXclean them, and
- XXcatenate the result on the file
- XX.I foo.txt.
- XX.TP
- XX.B "pep \-o foo.fil bar.fil"
- XXClean the files
- XX.I foo.fil
- XXand
- XX.I bar.fil,
- XXreplacing the
- XXoriginal files with the cleaned-up versions.
- XX.TP
- XX.B "pep \-ob foo.fil bar.fil"
- XXClean the files
- XX.I foo.fil
- XXand
- XX.I bar.fil,
- XXreplacing the
- XXoriginal files with the cleaned-up versions. The original
- XXfiles are preserved as
- XX.I foo.bak
- XXand
- XX.I bar.bak.
- XX.TP
- XX.B "pep \-i+ \-o program.dok"
- XXConvert the Norwegian text in the file
- XX.I "program.dok"
- XXto use
- XXthe IBM-PC 8 bit character set. Please note that this
- XXconversion may not be 100 percent correct. For instance,
- XXthe pipe symbol "|" will be converted to the lower case Norwegian
- XX.I oslash
- XXcharacter.
- XXThis is because the pipe symbol and the character share the
- XXsame ASCII code (124) in the Norwegian version of the 7-bit character
- XXset, but they have different codes when
- XXusing 8-bit character sets.
- XX.TP
- XX.B "pep \-e2 \-o kermit.log"
- XXInterpret ANSI screen control sequences in the file
- XX.I kermit.log.
- XXSet guard to level 2 (no deletion or overwriting).
- XX.IP
- XXIn this example, it is assumed that the file
- XX.I kermit.log
- XXis a log record of an on-line session with some Bulletin Board System (BBS).
- XXSuch files may be created with the command "log session" in the popular
- XX.I kermit
- XXcommunication program. Most other communication programs have
- XXsimilar commands. Many BBSs use
- XXANSI sequences for simple graphics, highlighting and
- XXother special effects, and you will get a much
- XXmore readable session log if you run it through
- XX.B pep
- XXwith the
- XX.B \-e
- XXoption turned on.
- XX.TP
- XX.B "test | pep \-e > test.scr"
- XXRun the program
- XX.I test,
- XXand pipe its output to
- XX.B pep,
- XXwhich interprets any ANSI sequences and stores the resulting screen
- XXimages in the file
- XX.I test.scr.
- XXNote that this is only
- XXpossible on operating systems that support pipes (e.g. UNIX and MS-DOS).
- XX.IP
- XXThe screen images will now be on standard text files which have the same
- XXgeneral layout as the original screen images. This may be useful if
- XXyou need text versions of the screen images for inclusion in manuals or
- XXfor prototypes.
- XX.TP
- XX.B "nroff \-man \-Tlpr pep.1l | pep > pep.doc"
- XXGenerate a plain text version of this manual, without
- XXbackspaces or double strikes
- XX.RB ( nroff
- XXis the standard Unix text formatter).
- XX.TP
- XX.B "pep \-d- \-o *.txt"
- XXConvert all files with extension
- XX.B .txt
- XXfrom DEC/ISO character set to Norwegian 7-bit ASCII characters.
- XX.TP
- XX.B "pep \-gibm2mac \-ur < foo.ibm > foo.mac"
- XXUse the conversion table in the file
- XX.I "ibm2mac"
- XXto convert
- XXthe character set in the file
- XX.I foo.ibm.
- XXStore the result on the file
- XX.I foo.mac,
- XXwhere each line should be terminated by a single CR character.
- XX.TP
- XX.B "pep \-m\- < foo.mac | pep \-i+ > foo.ibm"
- XXConvert Apple Macintosh encoded Norwegian characters in the file
- XX.I "foo.mac"
- XXto IBM-PC (Code Page 850) encoding. This is an alternative way to
- XXaccomplish the same thing as the conversion done in the previous
- XXexample.
- XX.TP
- XX.B "pep \-w- \-o *.*"
- XXConvert all files in the current directory from WordStar document
- XXmode to 7-bit ASCII.
- XX.TP
- XX.B "pep \-w+ \-t4 < foo.txt > foo.ws"
- XXConvert the file
- XX.I "foo.txt"
- XXto WordStar document mode format, also expanding tabulation (tabstop = 4)
- XXto space characters. The result is stored on a file named
- XX.I foo.ws.
- XX.B Pep
- XXuses a simple pattern recognition mechanism to recognize pages,
- XXparagraphs, soft white space and soft hyphens. It will probably
- XXnot do a 100% conversion, but the file will be much easier to
- XXedit in WordStar than the original.
- XX.TP
- XX.B "pep \-z \-x < foo.dat > foo.dmp"
- XXStrip the 8th bit and expand control characters to hex
- XXdigits in the file
- XX.I foo.dat,
- XXand store the result on the file
- XX.I foo.dmp.
- XX.IP
- XXExpanding the unprintable characters to hexadecimal makes it easier to
- XXinspect a file in an ordinary text editor, and to post-process it
- XXby a customized filter you may create yourself
- XXwith the search/replace and macro
- XXfacilities found in many editors today.
- XX.TP
- XX.B "pep \-s6 \-b < pep.exe"
- XXExtract "strings" from the file
- XX.I pep.exe.
- XXThe strings are just listed on standard output (the screen).
- XX"Strings" are in this context assumed to be any sequence of characters
- XXthat are at least 6 characters long. The
- XX.B \-b
- XXoption excludes characters with codes in the range 128 to 255 from
- XXthe search. It is almost always a good idea to combine the
- XX.B \-b
- XXoption with
- XX.B \-s
- XXoption, otherwise too much garbage is picked up by the filter.
- XX.TP
- XX.B "pep \-t4 \-c8 \-o foo.c"
- XXIf both tab expansion
- XX.B \-t
- XXand tab compression
- XX.B \-c
- XXare specified, then
- XX.B pep
- XXwill repack the tabulation. This is useful if you want to convert
- XXa file from one tab-size to another (e.g. to convert non-standard
- XX4 character tabulation into standard 8 character tabulation).
- XXIn this example, two TAB characters in the file
- XX.I foo.c
- XXare replaced by a single TAB character; any TAB character that cannot be
- XXpaired up is replaced by the appropriate number of spaces.
- XX.TP
- XX.B "pep \-t \-c \-o foo.c"
- XXRemove redundant space characters in existing tabulation in the file
- XX.I foo.c.
- XXWhat happens is that tabulation on each line is first expanded and
- XXthen compressed again, which effectively
- XXremoves any space characters "inside" a tabulation.
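The repacking described in the last two examples can be sketched as expansion followed by compression. The Python below is an illustration only, not pep's own code. The names compress_spaces and repack are hypothetical, and the rule used here (emit a TAB when a run of two or more spaces ends exactly on a tab stop) is one reading of the description of the -c option.

```python
def compress_spaces(line, size=8):
    """Replace runs of two or more spaces that end on a tab stop with TAB."""
    out = []
    col = 0   # current output column, counted from 0
    run = 0   # spaces seen since the last non-space or tab stop
    for ch in line:
        if ch == " ":
            run += 1
            col += 1
            if col % size == 0:  # the run reaches a tab stop
                out.append("\t" if run >= 2 else " " * run)
                run = 0
        else:
            out.append(" " * run)  # run ended short of a tab stop
            run = 0
            out.append(ch)
            col += 1
    out.append(" " * run)
    return "".join(out)

def repack(line, old_size, new_size):
    """Expand tabs at old_size, then compress again at new_size."""
    return compress_spaces(line.expandtabs(old_size), new_size)
```

With old_size 4 and new_size 8, two leading TABs collapse into one and a single leading TAB becomes four spaces, which matches the behaviour described for the -t4 -c8 example. The sketch works on one line at a time and assumes the line holds no terminator.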
- XX.SH DIAGNOSTICS
- XX.PP
- XXIf you specify an option that
- XX.B pep
- XXdoes not recognize, then
- XX.B pep
- XXwill
- XXwrite a summary of usage and abort. Other errors on the
- XXcommand line will result in
- XX.B pep
- XXwriting an error message
- XXbefore aborting.
- XX.PP
- XXOn operating systems that support exit codes,
- XX.B pep
- XXwill return an exit code upon termination.
- XX.PP
- XXIf
- XX.B pep
- XXis interpreting ANSI escape sequences and notices
- XXsyntactic or semantic errors in the way they are used, a
- XXwarning is printed on the screen, prefixed with the string
- XX"ansi:". This means that it is also possible to use
- XX.B pep
- XXto check if programs use ANSI sequences in a portable way.
- XX.SH FILES
- XX.TP 10
- XX.B pep, pep.exe, pep.cmd
- XXexecutable file (actual name depends upon which operating system you use).
- XX.TP
- XX.B mac2ibm
- XXsmall example of a user supplied conversion table
- XXto convert from the Macintosh character set to that used on
- XXthe Norwegian version of the original IBM-PC (the sample file
- XXonly covers the Norwegian characters \(em to complete it is
- XXleft as an exercise to the reader :-) ).
- XX.TP
- XX.B ibm2mac
- XXinverse of
- XX.B mac2ibm:
- XXconversion table from a small subset of
- XXIBM CP 850 to Macintosh character set.
- XX.TP
- XX.B ebc2ns7
- XXconversion table from the IBM EBCDIC character set to the Norwegian
- XXversion of the ASCII 7-bit character set (ISO646 NS4551).
- XX.TP
- XX.B ibm2ro8
- XXconversion table from the IBM-PC 8-bit character
- XXset to Hewlett-Packard ROMAN8.
- XX.TP
- XX.B ro82ibm
- XXinverse of
- XX.B ibm2ro8:
- XXconversion table from ROMAN8
- XXto IBM-PC character set.
- XX.TP
- XX.B ibm2iso
- XXconversion table from the IBM-PC CP 850 8-bit character
- XXset to ISO 8859/1.
- XX.TP
- XX.B iso2ibm
- XXinverse of
- XX.B ibm2iso:
- XXconversion table from ISO 8859/1 to CP 850.
- XX.SH AUTHOR
- XX.PP
- XXCopyright \(co 1989 Gisle Hannemyr.
- XX.PP
- XX.B Pep
- XXmay be freely distributed and copied, as long as this file
- XXis included in the distribution and that these statements
- XXabout authorship and copyright are not altered or removed.
- XX.PP
- XXBug reports, improvements, comments, suggestions and flames to:
- XX.ti +0.2i
- XXSnail: Gisle Hannemyr, Brageveien 3A, 0452 Oslo, Norway.
- XX.ti +0.2i
- XXEmail: gisle@nr.uninett (EAN);
- XX.ti +0.9i
- XXgisle@ifi.uio.no (Internet);
- XX.ti +0.9i
- XX\|.\|.\|.\|!mcvax!ifi!gisle (UUCP);
- XX.ti +0.9i
- XX(and several BBS mailboxes).
- XX.SH ACKNOWLEDGMENTS
- XX.PP
- XXThanks to Robert Andersson, for the SYS-V
- XX.I "rename"
- XXfunction; and to
- XXKnut Borge, Bjoern Larsen, Knut Omang and Geir-Harald Strand,
- XXfor elucidation of the unspeakable mysteries of VMS.
- XXSpecial thanks are due to Inge Arnesen for finding and fixing a bug
- XX(and to Nils-Eivind Naas for bringing it to my attention).
- XX
- XXSeveral people have contributed ideas and/or bug reports.
- XXIn addition to those mentioned above,
- XXOla Garstad, Ottar Grimstad,
- XXTor Sjoewall, and Jens-Henrik Soerensen
- XXshould be mentioned. My apologies if anyone
- XXis forgotten.
- XX.SH SEE ALSO
- XX.LP
- XX.BR dd (1),
- XX.BR detex (1L),
- XX.BR convert (VMS),
- XX.BR expand (1),
- XX.BR od (1V),
- XX.BR strings (1),
- XX.BR tr (1),
- XX.BR unexpand (1).
- XX.PP
- XX.BR Detex (1L)
- XXis a lex-based program to convert LaTeX and TeX manuscripts into plain
- XXASCII text. It is available from the author upon request. Those marked
- XXVMS are standard VMS utilities. The others are standard UNIX utilities.
- XX.SH BUGS
- XX.PP
- XXThere is a very strong Norwegian bias in
- XX.B pep.
- XXIn particular,
- XXthere exist several national versions of the ISO 646 7-bit
- XXcharacter set; but all built-in functions to convert between this
- XXand various 8-bit character sets (i.e.
- XX.B \-d, \-i, \-k
- XXand
- XX.BR \-m )
- XXbluntly assume the standard Norwegian version of ISO 646. For
- XX.B pep
- XXto work with other national 7-bit character sets, the
- XXcompiled in conversion tables (type FOLDMATRIX for those who read the
- XXsource code) need to be extended.
- XX.PP
- XXThe VMS version of
- XX.B pep
- XXruns with the
- XX.B \-o
- XXoption permanently enabled. This is because VMS does not support a
- XXuseful i/o redirection or pipe mechanism.
- XX.PP
- XXThe VMS Record Management Service (RMS) knows of several record formats.
- XXYou can see what record format a file is by using the VMS DCL command
- XX.I "DIRECTORY/FULL"
- XXand examining the field "Record format".
- XXOn VMS systems,
- XX.B Pep
- XXwill always generate output files with record format set to "Stream_LF",
- XXbut some programs may require that the output file is in other
- XXformats. To fix this, it might be necessary to run the output of
- XX.B pep
- XXthrough the VMS
- XX.B CONVERT
- XXutility. Please see the DEC VMS manuals for details.
- XX.PP
- XXThe Macintosh "text only" format uses the carriage return (CR) character
- XX(ASCII 13) as terminator. Most text processors (e.g. MacWrite)
- XXseem capable of handling two conventions:
- XXOne is to use CR to terminate each line (and two or more
- XXconsecutive CR's between paragraphs); the other is to use CR between
- XXparagraphs only.
- XX.B Pep
- XXis also capable of handling both conventions. The default behaviour
- XXis to terminate each line, but the
- XX.B \-v
- XXoption may be used to terminate paragraphs only.
- XXPlease note that
- XX.B pep
- XXuses a rather simplistic heuristic to identify the end of a paragraph:
- XXit bluntly assumes that paragraphs are separated by blank lines.
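The blank-line heuristic mentioned above can be sketched as follows. This Python fragment is an illustration only, not pep's own code. The function name paragraphs_only is hypothetical, and joining the lines of a paragraph with a single space is an assumption, since the text does not say how the lines are joined.

```python
def paragraphs_only(text, terminator="\r"):
    """Terminate paragraphs only; paragraphs are separated by blank lines."""
    paragraphs, current = [], []
    for line in text.split("\n"):
        if line.strip() == "":
            if current:  # a blank line closes the open paragraph
                paragraphs.append(" ".join(current))
                current = []
        else:
            current.append(line.strip())
    if current:
        paragraphs.append(" ".join(current))
    return "".join(p + terminator for p in paragraphs)
```

A text whose paragraphs are separated by something other than blank lines (indentation, for instance) would be run together by such a heuristic.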
- XX.PP
- XXIf you use the
- XX.B \-o
- XXoption, then the original input file will
- XXbe overwritten. Before you are familiar with
- XX.B pep,
- XXyou may
- XXfind that it sometimes removes more material than you expect
- XXfrom a file. It may be a good idea to always make a copy
- XXof the original file before you start experimenting with
- XX.B pep,
- XXor you may add the
- XX.B
- XX"b"
- XXargument to the
- XX.B
- XX\-o
- XXoption
- XX.B
- XX(\-ob).
- XX.PP
- XXThe built-in IBM-PC, DEC and Macintosh conversion tables
- XXconvert to and from the Norwegian version of 7-bit "ASCII"
- XXcharacters. You should use the
- XX.B \-g
- XXoption and "general" conversion tables for all other purposes.
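A "general" conversion table, in the format described under the -g option, could be read along the following lines. This Python sketch is an illustration only, not pep's own code. The name parse_table is hypothetical, the sample text is shortened from the table shown under -g, and the treatment of a comment that follows numbers on the same line is an assumption.

```python
SAMPLE = """\
# Convert from Macintosh to IBM-PC
##This line is not echoed on the screen.
# MAC IBM
 174 146
 129 143
# EOF
"""

def parse_table(text):
    """Return a 256-byte translation table and the comments to echo."""
    table = bytearray(range(256))  # start from the identity mapping
    echoed = []
    for raw in text.splitlines():
        data, hash_mark, comment = raw.partition("#")
        # Comments are echoed unless the line starts with "##".
        if hash_mark and not raw.lstrip().startswith("##"):
            echoed.append(("#" + comment).rstrip())
        fields = data.split()
        if len(fields) >= 2:
            # First number: code to convert from; second: code to convert to.
            table[int(fields[0])] = int(fields[1])
    return bytes(table), echoed

table, echoed = parse_table(SAMPLE)
converted = b"\xae\x81".translate(table)  # codes 174 and 129
```

Codes that the table does not mention pass through unchanged, which is why the sketch starts from the identity mapping.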
- XX.PP
- XX.B Pep
- XXonly knows the ANSI sequences implemented in the
- XXstandard MS-DOS console driver
- XX.I
- XXANSI.SYS.
- XX.PP
- XXThere cannot be a space character between an option and the
- XXoption's argument (e.g. you'll have to use
- XX.B
- XX"\-gfoo.bar",
- XXnot
- XX.B
- XX"\-g foo.bar").
- XX.PP
- XXPep will only filter "regular" files. It will skip directories, sockets
- XXand "special" files.
- XX.PP
- XXLinks are the GOTOs of file systems. If you run a hard linked file
- XXthrough pep using the
- XX.B \-o
- XXoption, the link will not be preserved. Pep will just skip soft
- XXlinked files.
- XX.PP
- XX.B Pep
- XXsearches for the conversion tables requested with the
- XX.B
- XX\-g
- XXoption in the following order: first the current directory,
- XXthen the directory of the file
- XX.I PEP.EXE
- XX(MS-DOS only), and finally the directory pointed to by the
- XX.B PEP
- XXenvironment
- XXvariable.
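The search order above can be sketched as a simple loop over candidate directories. This Python fragment is an illustration only, not pep's own code. The name find_table and its program_dir parameter (standing in for the directory of PEP.EXE on MS-DOS) are hypothetical.

```python
import os

def find_table(name, program_dir=None, environ=os.environ):
    """Look for a conversion table in the documented order."""
    candidates = [os.curdir]            # 1. the current directory
    if program_dir:                     # 2. directory of PEP.EXE (MS-DOS only)
        candidates.append(program_dir)
    if environ.get("PEP"):              # 3. directory named by PEP
        candidates.append(environ["PEP"])
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None  # not found in any of the places searched
```

Note that a table in the current directory shadows one with the same name in the PEP directory.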
- XX.PP
- XX.B Pep
- XXknows nothing about the COFF-format and the
- XX.B \-s
- XXoption is
- XXprimitive compared to the UNIX command
- XX.IR strings (1).
- XXSo if you are on a UNIX-system \(em forget about the
- XX.B \-s
- XXoption and use
- XX.IR strings (1)
- XXinstead.
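For readers who want to see how simple the notion of a "string" used by the -s option is, here is a Python sketch of it. It is an illustration only, not pep's own code. The name find_strings and the seven_bit flag (standing in for the effect of adding -b) are hypothetical.

```python
def find_strings(data: bytes, min_len=4, seven_bit=False):
    """Collect runs of "printable" bytes that are at least min_len long."""
    limit = 128 if seven_bit else 256

    def printable(b):
        # Codes 32 and up, plus TAB, LF and CR, count as printable.
        return 32 <= b < limit or b in (9, 10, 13)

    found, run = [], bytearray()
    for b in data:
        if printable(b):
            run.append(b)
        else:
            if len(run) >= min_len:
                found.append(bytes(run))
            run.clear()
    if len(run) >= min_len:  # a run may extend to the end of the data
        found.append(bytes(run))
    return found
```

With the default definition, bytes in the range 128 to 255 glue unrelated text together; restricting the search to 7-bit codes is what narrows the result.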
- XX.PP
- XX.B Pep
- XXwill not convert Word Perfect documents into plain ASCII.
- XXThis much requested function is, however, built into Word Perfect.
- XXIt is named "store as DOS-text" and is activated by pressing
- XXCTRL-F5 (at least in Word Perfect 4.2).
- XX.\" EOF
- SHAR_EOF
- if test 28373 -ne "`wc -c pep.1l`"
- then
- echo shar: error transmitting pep.1l '(should have been 28373 characters)'
- fi
- # End of shell archive
- exit 0
-
-